Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

admin: Support /drain_listeners?graceful #11639

Merged
merged 12 commits into from
Jun 24, 2020
Merged

Conversation

auni53
Copy link
Contributor

@auni53 auni53 commented Jun 18, 2020

Signed-off-by: Auni Ahsan [email protected]

Calling /drain_listeners?graceful will trigger the drain manager drain sequence prior to closing listeners.

Risk Level: Low.
Testing: Tested that connections are terminated on request complete during the graceful drain period, that new connections can still be opened, H1/H2-specific response behaviour.
Docs Changes: Add docs to admin.rst, improve the overall drain sequence documentation.

@htuch htuch requested a review from mattklein123 June 18, 2020 15:57
@auni53 auni53 changed the title admin: dd ?graceful query parameter to /drain_close admin: Support /drain_close?graceful Jun 18, 2020
@auni53
Copy link
Contributor Author

auni53 commented Jun 18, 2020

  1. We'd discussed automatically graceful draining on StopListeners::All, but that actually complicates the testing because it removes the ability to force drain in the StopListeners::All case. I'd prefer to just keep this simple for now rather than adding another query parameter that overrides it. (The current testing strategy is to invoke a long graceful drain period, do the testing, then invoke a non-graceful drain to close out.)

  2. What docs/api notes are needed for query parameter changes?

@auni53 auni53 marked this pull request as ready for review June 19, 2020 15:18
@mattklein123
Copy link
Member

This this makes sense to me. You will want to add docs here:
https://www.envoyproxy.io/docs/envoy/latest/operations/admin
https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/operations/draining

/wait

Signed-off-by: Auni Ahsan <[email protected]>
@auni53
Copy link
Contributor Author

auni53 commented Jun 22, 2020

Added docs, not sure if more detail is needed. I think the drain docs probably need an overhaul, but I don't really have the context with other parts of it to want to take that on.

Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with small comments.

/wait

.. http:post:: /drain_listeners?graceful

When draining listeners, enter a graceful drain period prior to closing listeners.
This behaviour and duration is determined by server options for the drain manager, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ref link to the relevant config and CLI options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't figure out how to ref link to the server options, not sure if the ServerInfo proto is what you mean, but I linked the CLI options.

@@ -12,6 +12,8 @@ various events. Draining occurs at the following times:
* The server is being :ref:`hot restarted <arch_overview_hot_restart>`.
* Individual listeners are being modified or removed via :ref:`LDS
<arch_overview_dynamic_config_lds>`.
* The server begins shutdown sequence via the :ref:`drain_listeners?graceful
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 39 below needs updating. Please take a general pass through this page. You have as good an understanding of this as any, and we expect contributors to care for the docs and improve them as they make changes. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I'm kind of rushing for a deadline and writing docs in markdown was a little confusing. Reorganized a lot of this doc and added more detail.

Signed-off-by: Auni Ahsan <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM with small comments.

/wait

In a few different scenarios, Envoy will attempt to gracefully shed connections. For instance,
during server shutdown, existing requests can be discouraged and listeners set to stop accepting,
to reduce the number of open connections when the server shuts down. Draining behaviour is defined
by the server options in addition to indvidiaul listener configs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo indvidiaul

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* Individual listeners are being modified or removed via :ref:`LDS
<arch_overview_dynamic_config_lds>`.

By default, the Envoy server will close listeners immediately on server shutdown. To close listeners
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
By default, the Envoy server will close listeners immediately on server shutdown. To close listeners
By default, the Envoy server will close listeners immediately on server shutdown. To drain listeners

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 34 to 36
:ref:`Redis <config_network_filters_redis_proxy>`, and
:ref:`Mongo <config_network_filters_mongo_proxy>`.
:ref:`HTTP connection manager <config_http_conn_man>`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixed up commas, period, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done I think. markdown is hard to read.


const bool graceful = params.find("graceful") != params.end();
if (graceful) {
server_.drainManager().startDrainSequence([this, stop_listeners_type]() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is called twice does something bad happen? Does this need to be guarded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh good catch. Added a test, added a guard, left a comment because I'm not sure what the admin response should return if you've called /drain_listeners?graceful twice. Should it silently do nothing? (Since the server's already draining...) Or return an error on the admin call? (If so what error?)

Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: Auni Ahsan <[email protected]>
@auni53 auni53 changed the title admin: Support /drain_close?graceful admin: Support /drain_listeners?graceful Jun 23, 2020
Signed-off-by: Auni Ahsan <[email protected]>
Copy link
Contributor Author

@auni53 auni53 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outstanding item on what to return from the admin endpoint in case of double calls to startDrainSequence()

In a few different scenarios, Envoy will attempt to gracefully shed connections. For instance,
during server shutdown, existing requests can be discouraged and listeners set to stop accepting,
to reduce the number of open connections when the server shuts down. Draining behaviour is defined
by the server options in addition to indvidiaul listener configs.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* Individual listeners are being modified or removed via :ref:`LDS
<arch_overview_dynamic_config_lds>`.

By default, the Envoy server will close listeners immediately on server shutdown. To close listeners
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 34 to 36
:ref:`Redis <config_network_filters_redis_proxy>`, and
:ref:`Mongo <config_network_filters_mongo_proxy>`.
:ref:`HTTP connection manager <config_http_conn_man>`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done I think. markdown is hard to read.


const bool graceful = params.find("graceful") != params.end();
if (graceful) {
server_.drainManager().startDrainSequence([this, stop_listeners_type]() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh good catch. Added a test, added a guard, left a comment because I'm not sure what the admin response should return if you've called /drain_listeners?graceful twice. Should it silently do nothing? (Since the server's already draining...) Or return an error on the admin call? (If so what error?)


const bool graceful = params.find("graceful") != params.end();
if (graceful) {
if (server_.drainManager().drainClose()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always going to return true depending on the settings? I think you need an actual "drain sequence started" boolean?

In terms of what to do, I don't feel strongly about it as long as it's documented well. I think returning 200 OK is fine with me.

/wait

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah actually this is going to flakily break on probabilistic drain. The underlying draining_ variable already exists, do you think I should just add a getter to the drain manager interface? Hopefully won't need to change the parent DrainDecision class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think a getter on the base interface is fine which would just say whether the drain sequence has started.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: Auni Ahsan <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than missing comment, thanks.

/wait

@@ -21,6 +21,8 @@ class DrainManager : public Network::DrainDecision {
*/
virtual void startDrainSequence(std::function<void()> drain_complete_cb) PURE;

virtual bool draining() const PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add doc comments for the interface method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the quick turnaround.

Signed-off-by: Auni Ahsan <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@htuch htuch merged commit 77efb47 into envoyproxy:master Jun 24, 2020
@htuch
Copy link
Member

htuch commented Jun 24, 2020

Mac OS X CI failure seemed unrelated (looks like it was just hanging, CC @asraa as on-call).

@auni53 auni53 deleted the graceful branch June 24, 2020 13:45
songhu pushed a commit to songhu/envoy that referenced this pull request Jun 25, 2020
Calling /drain_listeners?graceful will trigger the drain manager drain sequence prior to closing listeners.

Risk Level: Low.
Testing: Tested that connections are terminated on request complete during the graceful drain period, that new connections can still be opened, H1/H2-specific response behaviour.
Docs Changes: Add docs to admin.rst, improve the overall drain sequence documentation.

Signed-off-by: Auni Ahsan <[email protected]>
yashwant121 pushed a commit to yashwant121/envoy that referenced this pull request Jul 24, 2020
Calling /drain_listeners?graceful will trigger the drain manager drain sequence prior to closing listeners.

Risk Level: Low.
Testing: Tested that connections are terminated on request complete during the graceful drain period, that new connections can still be opened, H1/H2-specific response behaviour.
Docs Changes: Add docs to admin.rst, improve the overall drain sequence documentation.

Signed-off-by: Auni Ahsan <[email protected]>
Signed-off-by: yashwant121 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants